Testing on GH CI #1076


Draft: wants to merge 15 commits into base 16/edge
Conversation

taurus-forever
Contributor

Testing on GH CI...

@taurus-forever taurus-forever force-pushed the alutay/is_restart_pending_test_ci branch 2 times, most recently from 3b97355 to 3381fb3 Compare August 4, 2025 01:08

codecov bot commented Aug 4, 2025

Codecov Report

❌ Patch coverage is 34.61538% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.66%. Comparing base (ee02d5a) to head (269ed18).
⚠️ Report is 1 commit behind head on 16/edge.

Files with missing lines Patch % Lines
src/charm.py 41.37% 15 Missing and 2 partials ⚠️
src/cluster.py 28.57% 15 Missing ⚠️
src/relations/async_replication.py 0.00% 2 Missing ⚠️

❌ Your project check has failed because the head coverage (62.66%) is below the target coverage (70.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           16/edge    #1076      +/-   ##
===========================================
- Coverage    64.87%   62.66%   -2.21%     
===========================================
  Files           17       17              
  Lines         4270     4272       +2     
  Branches       656      636      -20     
===========================================
- Hits          2770     2677      -93     
- Misses        1333     1440     +107     
+ Partials       167      155      -12     

☔ View full report in Codecov by Sentry.

@taurus-forever taurus-forever force-pushed the alutay/is_restart_pending_test_ci branch 3 times, most recently from 1837b9c to 809daa0 Compare August 5, 2025 11:04
The previous is_restart_pending() waited for a long time due to Patroni's
loop_wait default value (10 seconds), which tells Patroni how long to wait
before re-checking the configuration file to reload it.

Instead of checking PostgreSQL's pending_restart from pg_settings,
let's check the Patroni API pending_restart=True flag.
The current Patroni 3.2.2 has weird/flickering behaviour:
it temporarily sets pending_restart=True on many changes via the REST API,
which is gone within a second but lasts long enough to be caught by the charm.
Sleeping a bit is a necessary evil until the Patroni 3.3.0 upgrade.

The previous code slept for 15 seconds waiting for the pg_settings update.
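The Patroni-API approach described above can be sketched as follows. This is a minimal illustration assuming a local Patroni REST API on the default port 8008; the /patroni member-status endpoint and its pending_restart field come from Patroni's documentation, while the helper names are ours, not the charm's actual code:

```python
import json
import urllib.request

# Assumed local Patroni REST API endpoint; Patroni serves member status
# at /patroni on its REST API port (8008 by default).
PATRONI_URL = "http://localhost:8008/patroni"

def parse_pending_restart(member_status: dict) -> bool:
    # Patroni includes pending_restart=true in the member status when a
    # restart-requiring setting changed; the key is absent otherwise.
    return bool(member_status.get("pending_restart", False))

def is_restart_pending(timeout: float = 5.0) -> bool:
    # Asking Patroni directly avoids waiting out loop_wait (default 10 s)
    # for pg_settings to catch up after a configuration reload.
    with urllib.request.urlopen(PATRONI_URL, timeout=timeout) as resp:
        return parse_pending_restart(json.load(resp))
```

Because the 3.2.2 flag flickers, a caller would still need the short sleep mentioned above before trusting a False result.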

Also, unnecessary restarts could be triggered by a mismatch between the
Patroni config file and in-memory changes coming from the REST API,
e.g. slots were undefined in the YAML file but set as an empty JSON {} => None.
Update the default template to match the default API PATCHes and avoid restarts.
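A minimal sketch of the mismatch described above: if the on-disk template omits slots while the REST API reports it as an empty mapping, a naive comparison schedules a needless restart. Normalizing both sides first avoids that. Key and function names are illustrative, not the charm's actual code:

```python
def normalize_slots(config: dict) -> dict:
    # Treat an absent/None "slots" key and an empty mapping {} as equal,
    # matching how the REST API reports an unset slots section.
    cfg = dict(config)
    if cfg.get("slots") in (None, {}):
        cfg["slots"] = {}
    return cfg

def needs_restart(on_disk: dict, in_memory: dict) -> bool:
    # Compare normalized views so {} vs. None no longer triggers a restart.
    return normalize_slots(on_disk) != normalize_slots(in_memory)
```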
On a topology observer event, the primary unit used to lose its Primary label.
Also:
* use a common logger everywhere
* add several useful log messages (e.g. DB connection)
* remove the no-longer-necessary debug message 'Init class PostgreSQL'
* align the Patroni API request style everywhere
* add Patroni API request duration to debug logs
The list of IPs was randomly sorted, causing unnecessary Patroni
configuration re-generation followed by a Patroni restart/reload.
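The fix can be illustrated with a one-line normalization: sorting the peer IPs before rendering makes the generated value deterministic regardless of discovery order, so the config file only changes when membership actually changes (the function name is illustrative):

```python
def render_host_list(ips: list[str]) -> str:
    # Sorting makes the rendered value independent of discovery order,
    # so identical membership always produces identical config text.
    return ",".join(sorted(ips))
```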
…hanged

Those defers are necessary to support scale-up/scale-down during the refresh,
but they significantly slow down the PostgreSQL 16 bootstrap (and other
routine maintenance tasks, like re-scaling, full node reboot/recovery, etc.).

Mute them for now, with a proper documentation record forbidding
rescaling during the refresh, until we minimise the number of defers in PG16.
Throw a warning for us to recall this promise.
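A minimal sketch of "mute the defer, keep a warning", with illustrative names rather than the charm's actual handler:

```python
import logging

logger = logging.getLogger("charm")

def on_scale_change_during_refresh(event) -> None:
    # Previously: event.defer() — correct, but it noticeably slowed the
    # PostgreSQL 16 bootstrap and routine maintenance tasks.
    logger.warning(
        "Rescaling during refresh is currently unsupported and the event is "
        "not deferred; revisit once PG16 reduces its use of defers."
    )
```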
The current PG16 logic relies on Juju update-status or on_topology_change
observer events, while in some cases we start Patroni without the observer,
causing a long wait until the next update-status arrives.

It is hard (impossible?) to catch Juju Primary label manipulations from
juju debug-log. Logging them simplifies troubleshooting.
We had to wait 30 seconds when the connection was down, which is unnecessarily long.

Also, add details on the reason for the failed connection Retry/CannotConnect.

This speeds up single-unit app deployments.
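A sketch of a shorter, bounded retry that also surfaces the failure reason instead of a bare CannotConnect; the timings, exception class, and function names here are illustrative, not the charm's actual values:

```python
import logging
import time

logger = logging.getLogger("charm")

class CannotConnect(Exception):
    """Raised when all connection attempts fail; carries the last reason."""

def connect_with_retry(connect, attempts: int = 3, delay: float = 2.0):
    # Try connect() a few times with short pauses instead of one long
    # 30-second wait, logging why each attempt failed.
    last_err = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError as err:
            last_err = err
            logger.debug("connection attempt %d/%d failed: %s",
                         attempt, attempts, err)
            if attempt < attempts:
                time.sleep(delay)
    # Include the underlying reason so operators see it in the traceback.
    raise CannotConnect(f"failed after {attempts} attempts: {last_err}")
```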
@taurus-forever taurus-forever force-pushed the alutay/is_restart_pending_test_ci branch from 809daa0 to 269ed18 Compare August 14, 2025 10:23